Abstract 

Ever since WMAP announced its first results, different analyses have shown that there is weak 
evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly 
appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult 
to account for them via a single statistical fluke. So, one is led to considering a combination of these 
anomalies. But, if we “hand-pick” the anomalies (test statistics) to consider, we are making an a 
posteriori choice. In this article, we propose two statistics that do not suffer from this problem. 
The statistics are linear and quadratic combinations of the $a_{\ell m}$'s with random co-efficients, and 
they test the null hypothesis that the $a_{\ell m}$'s are independent, normally-distributed, zero-mean 
random variables with an $m$-independent variance. The motivation for such statistics is generality; 
equivalently, it is a non a posteriori choice. But, a very useful by-product of considering such 
statistics is this: because most physical models that lead to large-scale anomalies result in coupling 
multiple $\ell$ and $m$ modes, the “coherence” of this coupling should get enhanced if a combination 
of different modes is considered. Using fiducial data, we demonstrate that the method works and 
discuss how it can be used with actual CMB data to make quite general statements about how 
incompatible the data are with the null hypothesis. 
I. INTRODUCTION 


A dramatic increase in the amount of observed data has, over the last couple of decades, 
led to a much better understanding of the Universe we inhabit. In fact, the cosmology 
community is so confident about the standard paradigm that the paradigm is referred to as 
the Standard (or Concordance) Model, after the Standard Model of Particle Physics. Seven 
or eight parameters, along with general relativity and elementary quantum mechanics, are 
sufficient to explain a host of observations on the largest scales, once initial conditions are set 
deep in the radiation era. Standard field quantization techniques applied to cosmic inflation 
have been remarkably successful in explaining even these initial conditions. The cosmology 
being studied today is called Precision Cosmology because parameters have been determined 
to percent-level precision BE]. 

But, as is well-known, there is a difference between precision and accuracy. Questions 
abound over some of the postulates of the Concordance Model. Because we have access to 
only one universe, the usual method of testing postulates by repeating experiments cannot 
be carried out. As inflation postulates that the primordial seeds of the universe’s structure 
themselves arise out of a stochastic process, this inability to repeat experiments is an even 
bigger handicap. 

The cosmic microwave background (CMB) has turned out to be the cosmologist’s most 
useful aid in understanding what has happened in the universe from just a few minutes after 
the Big Bang, all the way up to the present. Since most CMB photons have travelled to us 
without any scattering, they represent a very faithful picture of the universe when it was 
about 400,000 years old. Moreover, at the scales relevant to us today, the density perturbations 
were small enough that linear perturbation theory is an excellent approximation. This 
implies that the statistical properties of the primordial fluctuations were preserved all the 
way to the surface of last scattering, and thence to us today. 

In vanilla models of inflation, the Fourier modes of the primordial fluctuations have the 
same dynamics as harmonic oscillators in their ground state, and are thus distributed as 
Gaussians.¹ Moreover, statistical homogeneity and isotropy imply that the variance of this 
Gaussian distribution doesn’t depend on the direction of the wavenumber of the Fourier 
mode, and that the variance is the same for the real and the imaginary part of the Fourier 
modes |1]. In 2013, Planck announced that the CMB data put very strong constraints on the 
amount of non-Gaussianity in the primordial power spectrum |S]. In effect, this meant that 
several exotic inflationary models got ruled out with very high probability. So, the accepted 
wisdom is that the Fourier modes of the initial density perturbations are independent and 
distributed as Gaussians. 

¹ We ignore non-Gaussianities of the kind calculated by Maldacena [3] as they are highly suppressed. 

The only challenge to this postulate of independent, normally distributed perturbations 
probably has to do with the so-called CMB anomalies. The CMB anisotropies across the 
sky are usually expressed in terms of $a_{\ell m}$'s, the co-efficients corresponding to the spherical 
harmonics $Y_{\ell m}$. Expressing Fourier modes in terms of the spherical harmonics, and using 
the results from the previous paragraph, we are led to conclude that most viable inflationary 
models predict that the $a_{\ell m}$'s are normally distributed with zero mean and with a variance 
$C_\ell$ (hence independent of $m$). 

When WMAP announced [U] its first set of results, the authors in [7j and [H] analysed 
the $a_{\ell m}$'s to test this hypothesis. They employed a variety of tests and found weak evidence 
for correlation amongst the $a_{\ell m}$'s that corresponded to the largest scales (low-$\ell$'s). The 
anomalies reported dealt with the alignment of different multipoles and how planar a few of 
these multipoles were. Several authors [MI] performed similar analyses and again found 
weak evidence. A different kind of anomaly, having to do with a low value for the variance in 
the CMB sky, was observed by [T2| for the WMAP data. The authors in [TH1 - IT3] considered 
the isotropy of the angular power spectrum and concluded that it appears to be anisotropic. 
A few more anomalies were reported in [T6lfl8], amongst other works. 

Apart from the weak evidence, two arguments were proffered questioning the “real” 
nature of these anomalies: one, that they arose from the systematics that WMAP employed; 
two, that these anomalies were checked for a posteriori. So, now that Planck has confirmed 
that most of the anomalies are present in their data too [12], one may reasonably argue 
that the anomalies are a bona fide feature of the CMB. The question remains as to whether 
this feature is physically relevant or not. As Planck also concluded that there is only weak 
evidence for these anomalies, this question has not been settled convincingly. Many authors 
contend [2^, with good reason, that given a large enough dataset, one can always find any 
feature that one desires. Compounding the problem of the large dataset is the fact that 
the anomalies have been observed for low-$\ell$'s; it is here that the effect of cosmic variance is 
most pronounced. This makes statistical inferences about the anomalies even more dubious. 
Also, there is the perennial question of foreground contamination: without a reliable model 
for galactic dust, it isn’t clear how accurate the determination of the $a_{\ell m}$'s is. (Though, with 
the availability of multiple probes and multiple frequency channels, this is less of an issue 
IZH than it used to be.) 

But, the fact remains that there are many anomalies with weak evidence. Some of them 
are so apparently different from the rest that, at the outset, it seems hard to believe 
that they all arose from a common statistical fluke. And, the anomalies seem to be present 
“coherently” across different $\ell$'s too, seemingly making it harder to believe that it is the 
consequence of a fluke. 

This paper tries to address the second of the arguments put forth against the existence 
of the anomalies—that the tests are all a posteriori. We propose two statistics that test the 
null hypothesis that the $a_{\ell m}$'s are independent, normally distributed zero-mean variables. As 
we shall show, these statistics are such that one cannot reasonably be accused of performing 
the analysis after “seeing” the anomalies in the data. The point is to perform as general an 
analysis of the data as possible, without worrying about whether a test statistic is physically- 
motivated or high-confidence-interval motivated. We shall achieve this by not arbitrarily 
choosing the $\ell$'s and the $m$'s to analyse; instead, we consider their linear and quadratic 
combinations. For one, this makes the analysis more general; but, crucially, if the anomalies 
are physical, it is very unlikely that they arose because of a coupling between just two or three 
$a_{\ell m}$'s. This anomalous nature must be present for a range of modes and thus considering 
combinations of the modes should lead to an enhancement of the signal. Also, previous 
analyses of CMB anomalies have involved several Monte Carlo simulations to produce a 
reference set of Gaussian sky maps. And, one gets several p-values as different statistics are 
considered. In our case, once a maximum $\ell$ value is chosen, one gets a single p-value for 
each of the two statistics considered. 

Of course, the real interest lies in applying this test to actual data. But, in this article, we 
restrict ourselves to first demonstrating that this test actually works. This is partly due to 
the fact that this is a novel method and we would like to introduce it and test its usefulness; 
it is also due to the fact that different missions and different foreground subtractions have 
led to a plethora of maps. Hopefully, in a subsequent article, we will apply the proposed test 
in a manner consistent with the multiplicity of the available maps. Also, a Planck release 
of the polarization data is expected in the near future. It would be interesting to test the 
statistical properties of this too, along with that of the scalar modes. 


II. Y – A LINEAR STATISTIC 

A. Motivation 

In broad terms, the way a hypothesis is statistically tested is this: Assume that a given 
dataset is described by a known probability distribution $P_1$; formulate a statistic that is a 
function of the corresponding random variables; determine the expected distribution $P_2$ of 
this statistic, assuming the fiducial distribution $P_1$; see how compatible the actual (realized) 
value of the statistic using the given dataset is, with the distribution $P_2$. If the compatibility 
is very low, then one concludes that the data are inconsistent with the hypothesis.² It is 
clear, however, that the conclusion strongly depends on the statistic chosen. Ideally, one 
would like to do the analysis for several different statistics. 

Let us look at linear test statistics; that is, if an n-component vector X describes n 
variables of a dataset, then consider $S = \mathbf{a} \cdot \mathbf{X}$, where each choice of the constant co-efficients 
a would correspond to one statistic. If one wants to do a blind analysis of the data, 
one is tempted to consider several different choices of a, for instance, by making a itself a 
random vector. If one knows the underlying distribution of a, and the null hypothesis for 
the distribution of X, then one may hope to determine the distribution of S. In general, 
this distribution would be quite complicated. In the next section, we show that for a specific 
choice of the distribution of a, and a specific null hypothesis, the distribution of S becomes 
very simple. 

B. The Y Statistic 

Let a be an N-component random variable vector, with each component being described 
by a zero-mean normal distribution, $\mathcal{N}(0, \alpha_i^2)$. Let X be another N-component vector with 
each of its components being described by $\mathcal{N}(0, \beta_i^2)$. Further, assume that $\alpha_i^2 = 1/\beta_i^2$. That 
is, the combined probability distribution function is 

$$P(\mathbf{a}, \mathbf{X}) = \frac{1}{(2\pi)^{N}} \left(\det\Sigma_a \, \det\Sigma_X\right)^{-1/2} \exp\left(-\tfrac{1}{2}\, \mathbf{a}^{T} \Sigma_a^{-1} \mathbf{a}\right) \exp\left(-\tfrac{1}{2}\, \mathbf{X}^{T} \Sigma_X^{-1} \mathbf{X}\right), \qquad (1)$$

where $\Sigma_a$ and $\Sigma_X$ are diagonal matrices with $(\Sigma_a)_{ij} = (\Sigma_X)^{-1}_{ij} = \alpha_i^2 \delta_{ij}$. 

² This is more of a goodness-of-fit test than a hypothesis test because we are not specifying an alternative 
hypothesis. But, the former can be thought of as a special case of the latter, where the alternative 
hypothesis is that the data are not described by the null hypothesis. 

Consider a random variable arising out of these two random variables, 

$$Y = \mathbf{a} \cdot \mathbf{X} = a_i X^i \qquad (2)$$

In the above, Einstein’s summation convention is implied. Though both a and X are random 
variables, we shall eventually consider the case where there is only one realization of X. That 
is, the two random variables must not be considered to be on the same footing. We shall first 
treat X as a constant vector, carry out all operations with respect to a and finally promote 
X to a random vector and carry out operations with respect to it. This shall become more 
clear when we apply it to the case of Cosmology. 

For a given realization of X, Y is a linear combination of the independent normal variables 
a. Hence, Y is normally distributed too: 

$$Y \sim \mathcal{N}\left(0,\; \alpha_1^2 X_1^2 + \cdots + \alpha_N^2 X_N^2\right) := \mathcal{N}\left(0, \sigma^2\right) \qquad (3)$$

This is for a given realization of the $X^i$'s. But, the $X^i$'s themselves are random variables 
with an underlying distribution. Thus, we may ask how $\sigma^2$ is distributed. Because $\alpha_i^2 = 1/\beta_i^2$, 
that is, the reciprocal of the variance of $X^i$, $\sigma^2$ is the sum of squares of N normally 
distributed random variables with zero mean and unit variance. Hence, $\sigma^2$ follows a Chi-squared 
distribution with N degrees of freedom, $\sigma^2 \sim \chi^2(N)$. To calculate the probability 
distribution of Y, that is, $P(Y = y)$, we need to marginalize over this $\chi^2(N)$ because the 
variance is now a random variable: 


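$$P(Y = y) = \int_0^\infty \mathcal{N}(y;\, 0, \sigma^2)\, \chi^2(\sigma^2; N)\, d\sigma^2 = \frac{|y|^{(N-1)/2}\, K_{(N-1)/2}(|y|)}{\sqrt{\pi}\; 2^{(N-1)/2}\; \Gamma(N/2)}, \qquad (4)$$

where K is the modified Bessel function of the second kind.³ 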
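
The marginalization over the variance can be spelled out as follows. This is only a sketch of the intermediate steps, assuming the standard integral representation of the modified Bessel function of the second kind; the symbols $s$ (for the variance) and $\nu$ are introduced here purely for illustration.

```latex
\begin{align*}
P(Y = y)
  &= \int_0^\infty \frac{e^{-y^2/(2s)}}{\sqrt{2\pi s}}\;
     \frac{s^{N/2-1}\, e^{-s/2}}{2^{N/2}\,\Gamma(N/2)}\; ds \\
  &= \frac{1}{\sqrt{2\pi}\; 2^{N/2}\,\Gamma(N/2)}
     \int_0^\infty s^{\nu-1}\, e^{-s/2 \,-\, y^2/(2s)}\; ds ,
     \qquad \nu = \frac{N-1}{2}, \\
  &= \frac{|y|^{\nu}\, K_{\nu}(|y|)}{\sqrt{\pi}\; 2^{\nu}\;\Gamma(N/2)} .
\end{align*}
% Last step: \int_0^\infty s^{\nu-1} e^{-a s - b/s}\, ds = 2\,(b/a)^{\nu/2} K_\nu(2\sqrt{ab}),
% evaluated at a = 1/2 and b = y^2/2.
% Special cases: N = 1 gives K_0(|y|)/\pi, and N = 2 gives e^{-|y|}/2 (a Laplace distribution).
```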

As the distribution of Y depends only on N, one may wonder where the distributions of a and 
X enter. It is only because of the choice of the variances of the distribution, $\alpha_i^2 = 1/\beta_i^2$, 
that the dependence on the details of the distribution “cancels out”. So, as promised earlier, 
we have shown that a specific choice of the distribution for the co-efficients (a in this case) 
results in a very simple form for the distribution of the statistic. 
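
This cancellation can be checked with a quick Monte Carlo experiment. The snippet below is a minimal sketch, not the authors' code: the function names, the two illustrative sets of $\beta_i$ and the sample size are all assumptions made for the example. With the variances of a chosen as the reciprocals of those of X, the variance of Y over the full ensemble should come out close to N whatever the $\beta_i$ are.

```python
import random


def sample_Y(betas, rng):
    """Draw one Y = sum_i a_i X_i with X_i ~ N(0, beta_i^2) and a_i ~ N(0, 1/beta_i^2)."""
    total = 0.0
    for beta in betas:
        x = rng.gauss(0.0, beta)        # standard deviation beta
        a = rng.gauss(0.0, 1.0 / beta)  # standard deviation 1/beta
        total += a * x
    return total


def sample_moments(betas, n_draws, seed):
    """Return the sample mean and variance of Y over the full (a, X) ensemble."""
    rng = random.Random(seed)
    ys = [sample_Y(betas, rng) for _ in range(n_draws)]
    mean = sum(ys) / n_draws
    var = sum((y - mean) ** 2 for y in ys) / n_draws
    return mean, var


N = 8  # hypothetical, small number of modes for illustration
flat = [1.0] * N                                # all X_i share the same variance
steep = [0.1 * (i + 1) ** 2 for i in range(N)]  # strongly varying variances
for label, betas in (("flat", flat), ("steep", steep)):
    mean, var = sample_moments(betas, 40000, seed=1)
    print(label, "mean:", round(mean, 3), "variance:", round(var, 3), "expected variance:", N)
```

Both choices of the $\beta_i$ should give statistically indistinguishable moments, which is the sense in which the details of the distributions drop out.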

Usually, the word (test) statistic is reserved for a function of the data. In particular, for 
each set of data, such a statistic returns a single number. In our case, by construction, the 
Y statistic is not a single number because a given X is multiplied by several a. We shall 
call such statistics vector statistics. 

C. Realizations 

In cosmology, we have only one realization of the Universe. For our purposes, this translates 
into one realization of the $a_{\ell m}$'s, which we take to be the real multipole co-efficients 
corresponding to the real spherical harmonics $Y_{\ell m}$'s (see, for instance, Appendix A of iza). 
The index $m$ then ranges over $[-\ell, \ell]$. But, the Concordance Model of Cosmology predicts 
that each $a_{\ell m}$ is a random variable, arising from a Gaussian distribution $\mathcal{N}(0, C_\ell)$. This can 
be thought of as the null hypothesis $H_0$. 

Thus, the $a_{\ell m}$'s are like our X and we shall refer to them as X, in order to keep matters 
general. One way of testing $H_0$ is by considering different test statistics of X and seeing if 
their realized values are compatible with that predicted by the null hypothesis. One trouble 
with this method is that X doesn’t have a basis-independent definition—its meaning depends 
on the coordinate system employed in the sky. Further, the p-values that one derives depend 
fundamentally on the test statistic chosen. So, just because one such p-value is compatible 
with the null hypothesis doesn’t mean that the data are. 

In our case, note that the Y statistic is independent of the co-ordinate system chosen in 
the sky. To see this, consider an active rotation of the co-ordinate system. It can be shown 
that the transformed X, say X', is related to X by a real orthogonal matrix⁴, say $\mathcal{R}$; that is, 
$X' = \mathcal{R} \cdot X$. The Y statistic arising out of X' is, say, 
$Y' = a_i (X')^i = a_i \mathcal{R}^i{}_j X^j = (\mathcal{R}^{T})_j{}^i a_i X^j := (a')_j X^j$. 
Using (1) and $\mathcal{R}^{T}\mathcal{R} = 1$, it is clear that the PDFs for $a_i$ and $(a')_i$ are the same and 
hence the Y statistic is co-ordinate system independent. 

³ This distribution can be considered to be the generalization of the well-known distribution of the random 
variable that is the product of two standard Gaussian variables. The latter corresponds to the N = 1 case 
of the former. 

Now that we have discussed the test statistic and its properties, let us detail our motivation 
for considering this statistic and what we intend to do with it. One might wonder 
why a linear combination of the components of X is being considered. This has to do with 
the kind of anomalies that are usually discussed. It is very natural to assume that these 
anomalies are the result of some correlation between the different components of X. Indeed, 
many models that attempt to explain these anomalies posit precisely such a correlation (see 
[2S] and references therein for a review of the anomalies and some proposed explanations for 
their origins). The only way to test these correlations is by considering functions that “mix” 
the different components. A linear superposition is just the simplest of these functions. We 
shall consider second-order statistics in due course. 

We now consider a more operational definition of X. We specialize to the case where x, 
the realization of X, is the set of $a_{\ell m}$'s. That is, $X_1 = a_{2,-2}$, $X_2 = a_{2,-1}$, $\ldots$, $X_5 = a_{2,2}$; 
$X_6 = a_{3,-3}$, $\ldots$; $X_N = a_{\ell_{\max},\ell_{\max}}$.⁵ Here, $\ell_{\max}$ is the largest $\ell$ value that we go up to: 

$$\ell_{\max}^2 + 2\ell_{\max} - (3 + N) = 0 \qquad (5)$$
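
The ordering of the modes and the relation between N and the largest multipole can be made concrete with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the positions are 0-based (so the text's $X_1$ sits at position 0), and the monopole and dipole are excluded as in the text.

```python
import math


def n_modes(l_max, l_min=2):
    """Number of real a_lm coefficients with l_min <= l <= l_max."""
    return (l_max + 1) ** 2 - l_min ** 2


def index_of(l, m, l_min=2):
    """0-based position of a_{l,m} in the ordered vector X (m runs from -l to l)."""
    if l < l_min or abs(m) > l:
        raise ValueError("need l >= l_min and |m| <= l")
    return (l * l - l_min * l_min) + (m + l)


def lm_of(index, l_min=2):
    """Inverse of index_of: recover (l, m) from the 0-based position."""
    shifted = index + l_min * l_min
    l = math.isqrt(shifted)
    m = shifted - l * l - l
    return l, m


print(n_modes(25))       # number of modes up to l_max = 25
print(index_of(2, -2))   # first element of X
print(index_of(3, -3))   # first l = 3 element
print(lm_of(n_modes(25) - 1))  # last element of X
```

With `l_min = 2`, `n_modes` is just equation (5) solved for N, that is, $N = (\ell_{\max} + 1)^2 - 4$.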

The strategy is the following: Under the null hypothesis $H_0$, we have the distribution for 
Y, given in (4). From CMB experiments such as WMAP and Planck, we have the realized 
values of X in the actual sky. We use these realized values of X, $x_{\rm sky}$, and determine the 
distribution of Y, $P(y_{\rm sky})$. We can compare this distribution with (4) and can then infer 
the compatibility of CMB data with $H_0$. 

III. HYPOTHESIS TESTING 

Usually, hypothesis testing involves calculating the probability of the realized value of a 
statistic, given the distribution of the statistic under the assumption of the null hypothesis. 

⁴ The transformation matrix is given by $C^{*} \mathcal{D}\, C^{T}$ |2Hj, where C is a matrix that relates the complex spherical 
harmonics to the real ones, and $\mathcal{D}$ is the Wigner D-matrix [21] that describes how complex spherical 
harmonics transform under rotations. Both matrices are unitary and * denotes complex conjugation. 

⁵ As is usual in CMB analyses, we ignore the monopole and the dipole ($\ell = 0$ and $\ell = 1$). 



This procedure cannot be directly implemented in our approach because, by construction, 
our test statistic Y doesn’t yield a single number for a given dataset; it is a vector statistic. 
So, whereas in the usual case we only have to compare one realized value of the test statistic 
with the expected value, in our case, by its very nature, we must compare the realized 
distribution $P(y_{\rm sky})$ with that in (4). 

Now, there is no unique way of comparing two arbitrary distributions. As we are basically 
looking for a measure of goodness-of-fit, we could consider a chi-squared test. But, 
chi-squared tests are more useful in circumstances where one is estimating the parameters in 
a given model. In that case, minimising chi-squared leads to the best-fit parameters. That 
is not what we are doing here. We are actually comparing data with a fiducial distribution 
function. Moreover, using the chi-squared test involves binning the data, and some information 
is lost in this process. It would be more desirable to work with tests that use the 
data themselves, not bins of data. 

Different such tests have been proposed in the literature, and we shall adopt the Anderson- 
Darling (A-D) test [2B], which we shall describe shortly. The reason for the choice is that 
studies IZ71 have shown that, for a variety of distributions, this test is more powerful than 
others such as the more commonly used Kolmogorov-Smirnov (K-S) test. A possible drawback 
of using the A-D test instead of, say, the K-S test is that the critical values depend 
on the distribution corresponding to the null hypothesis. But, because we know the form 
of this distribution (4), the critical values can be calculated. Moreover, this dependence 
on the distribution is reflective of the fact that the A-D test is much more sensitive to the 
underlying distribution than the K-S test, and hence more powerful. 

A. Anderson-Darling Test 

Let V be a random variable and let the null hypothesis be that the (continuous) probability 
distribution F(V) describes this variable. Further, let the m-component vector w 
represent m samples of V, sorted in increasing order. Define $\Phi(w)$ to be the cumulative 
distribution function corresponding to F. Also, define 

$$S = \sum_{i=1}^{m} \frac{2i - 1}{m} \left( \log \Phi(w_i) + \log\left[1 - \Phi(w_{m+1-i})\right] \right) \qquad (6)$$

The A-D statistic is then given by 

$$A^2 = -m - S \qquad (7)$$

For well-known distributions, such as the normal distribution, critical values of the $A^2$ statistic 
have been calculated in the literature. Associated with each critical value is a p-value, with 
which the null hypothesis can be rejected at the corresponding significance. For example, 
a value of $A^2$ of more than 3.857 would mean rejecting the null hypothesis that the data are 
described by a normal distribution with a given mean and variance at the 1% level.⁶ 
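
The statistic defined in (6) and (7) is straightforward to compute. The following is a minimal sketch using only the Python standard library; the function names and the clipping constants (added to avoid taking the logarithm of zero) are assumptions of this example, not part of the original definition.

```python
import math
import random


def anderson_darling(samples, cdf):
    """Return A^2 = -m - S for `samples` tested against a fully specified `cdf`."""
    w = sorted(samples)
    m = len(w)
    tiny, almost_one = 1e-300, 1.0 - 1e-16  # guard against log(0)
    s = 0.0
    for i in range(1, m + 1):
        low = min(max(cdf(w[i - 1]), tiny), almost_one)   # Phi(w_i)
        high = min(max(cdf(w[m - i]), tiny), almost_one)  # Phi(w_{m+1-i})
        s += (2 * i - 1) / m * (math.log(low) + math.log(1.0 - high))
    return -m - s


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution with a given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(500)]
print("A^2 against the correct normal:", anderson_darling(samples, normal_cdf))
print("A^2 against a shifted normal:  ",
      anderson_darling(samples, lambda x: normal_cdf(x, mean=0.5)))
```

Testing against the shifted normal should give a much larger value of $A^2$ than testing against the distribution the samples were actually drawn from.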

As our distribution (4) is not one of the common distributions (the earliest reference to 
it that we could find is in [2S]), published critical values for the A-D test do not exist. But, 
for a given N, we can determine them simply by generating a large number of realizations 
drawn from (4), calculating the corresponding value of $A^2$, and repeating this procedure a 
sufficient number of times. This would give us the distribution of $A^2$ for (4), from which the 
critical values can be calculated. Call this distribution $\Psi_Y(A^2, N)$. 

A peculiar feature arises out of the fact that we only have access to one realization of the 
$a_{\ell m}$'s. For typical PDFs, the distribution of $A^2$ in (7) asymptotes fairly quickly to a fixed 
distribution as the number of realizations (m in the equation) increases. But, recall that we 
have only one realization of X. So, even if we increase the number of Y statistics generated 
(thereby increasing the corresponding m), this is not equivalent to an ergodic sampling of 
the distribution. In particular, if we choose, say, m = 10®, then, it does matter whether we 
generate m realizations of Y by choosing 10® realizations of a and 1 realization of X, or by 
choosing 10^ realizations of a and 10^ realizations of X. Thus, it turns out that in our case 
the distribution of $A^2$ for a given N, what we called $\Psi_Y(A^2, N)$, depends on m. We shall 
denote this distribution by $\Psi_Y(A^2, N, m)$. Implicit in this notation is the fact that we are 
choosing only one realization of X. 

⁶ This is in the limit of infinite data, and for data that have been standardised (subtract the mean from the 
data, and divide by the standard deviation), though, for the case of the normal distribution, modifications 
for finite m exist. 
FIG. 1: PDF for $\Psi_Y(A^2, N, m)$ for m = 10^, N = 672. 


For a given N and m, we can determine $\Psi_Y(A^2, N, m)$ numerically by simply following 
the procedure outlined above: We choose both a and X to be N-dimensional normal vectors 
with zero mean and unit variance. We pick one realization of X and m realizations of 
a, calculate the corresponding Y values and the corresponding value of $A^2$, which is one 
realization drawn from $\Psi_Y(A^2, N, m)$. We 
repeat this procedure several times until we have mapped out the distribution $\Psi_Y(A^2, N, m)$ 
reasonably well. This distribution for N = 672 and m = 10® is shown in Figure 1. 
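
The dependence on m that comes from having a single realization of X can be illustrated with a toy version of this procedure. The sketch below is not the authors' pipeline: it assumes N = 2 and unit variances, a case in which the marginal distribution of Y reduces to a Laplace distribution with density $e^{-|y|}/2$, so that its CDF is available in closed form. All function names, the number of trials and the values of m are illustrative choices.

```python
import math
import random


def laplace_cdf(y):
    """CDF of the N = 2 marginal distribution of Y (unit-scale Laplace)."""
    return 0.5 * math.exp(y) if y < 0 else 1.0 - 0.5 * math.exp(-y)


def ad_statistic(samples, cdf):
    """Anderson-Darling A^2 of `samples` against a fully specified `cdf`."""
    w = sorted(samples)
    m = len(w)
    tiny = 1e-300  # guard against log(0)
    s = 0.0
    for i in range(1, m + 1):
        s += (2 * i - 1) / m * (math.log(max(cdf(w[i - 1]), tiny))
                                + math.log(max(1.0 - cdf(w[m - i]), tiny)))
    return -m - s


def draw_ys(m, rng, single_x):
    """Draw m values of Y = a . X; X is drawn once if single_x, else afresh each time."""
    x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    ys = []
    for _ in range(m):
        if not single_x:
            x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        ys.append(rng.gauss(0.0, 1.0) * x[0] + rng.gauss(0.0, 1.0) * x[1])
    return ys


def median_ad(m, n_trials, single_x, seed=0):
    """Median of A^2 over n_trials repetitions, a one-number summary of its distribution."""
    rng = random.Random(seed)
    values = sorted(ad_statistic(draw_ys(m, rng, single_x), laplace_cdf)
                    for _ in range(n_trials))
    return values[n_trials // 2]


for m in (50, 400, 2000):
    print("m =", m,
          "| one X per trial:", round(median_ad(m, 40, True), 2),
          "| fresh X per draw:", round(median_ad(m, 40, False), 2))
```

With a fresh X for every draw, the typical $A^2$ should stay roughly constant as m grows; with one X per trial it should grow with m, which is why the reference distribution has to be built with the same m and the same single-X sampling as the data.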

Once we have determined $\Psi_Y(A^2, N, m)$, putting limits on how anomalous the data are, 
in terms of our formalism, is relatively straightforward. We have discussed in the previous 
section how we can generate a given number (say, m) of the Y statistic. On sorting this data 
vector in increasing order, we can proceed to calculate the realized value of $A^2$ with the 
distribution in (4) corresponding to F(V) above. Call this $A^2_{\rm sky}$. This value can be compared 
with $\Psi_Y(A^2, N, m)$ and a p-value can be calculated. 

IV. Z—A QUADRATIC STATISTIC 

A. Ensemble 

In the previous section, we considered a linear combination of the different $a_{\ell m}$'s. We 
mentioned that if the anomalies are real, then models that could produce these anomalies 
without correlations amongst the different $a_{\ell m}$'s would likely be very contrived. So, to probe 
these correlations better, it is natural to consider test statistics that are second order in the 
$a_{\ell m}$'s. We shall do that here and revert to using X to denote the ordered set of $a_{\ell m}$'s. 
Consider the following test statistic: 


$$Z = B_{12} X_1 X_2 + B_{34} X_3 X_4 + \cdots + B_{N-1,N} X_{N-1} X_N \qquad (8)$$


For now, assume that N is even, so that this definition always makes sense (N is even for 
an odd $\ell_{\max}$). We shall comment on dealing with an odd N later. 

$B_{ij}$ is a random variable distributed as $\mathcal{N}(0, \sigma_{ij}^2)$, where $\sigma_{ij}^2 = 1/(X_i^2 \beta_j^2)$. Recall that $\beta_j^2$ is 
the variance of the normally distributed $X_j$. Note that we are using $X_i$ itself as a parameter 
describing a distribution. (Compare this with the distribution of a, which depended on the 
variance of X, and not on X itself.) This is not an issue because X is still being treated 
as a fixed vector. The reason for this choice of $\sigma_{ij}^2$ will become clear momentarily, but, it 
must be borne in mind that it gets determined after a choice of X is made. 

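With this, Z is basically a sum of N/2 Gaussian random variables $B_{ij}$, with constant 
coefficients $X_i X_j$. Thus, we have that $Z \sim \mathcal{N}(0, \sigma_Z^2)$, where 

$$\sigma_Z^2 = X_1^2 X_2^2 \sigma_{12}^2 + \cdots + X_{N-1}^2 X_N^2 \sigma_{N-1,N}^2 = \frac{X_2^2}{\beta_2^2} + \cdots + \frac{X_N^2}{\beta_N^2} \sim \chi^2(N/2).$$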
Here, similar to the analysis in Section II B, we have used the fact that $\sigma_Z^2$ is the sum 
of the squares of N/2 normally distributed, zero-mean random variables with unit variance. 
Though this has been said several times already, because of the novel nature of this 
treatment, it must be stressed that, up to now, X has been treated as a fixed vector. 

Similar to what we did for the test statistic Y, we now perform an ensemble average of 
Z with respect to the distribution of X. Repeating the calculation that led to (4), with half 
the number of terms, we have that the distribution of Z is 


$$P(Z = z) = \frac{|z|^{(N-2)/4}\, K_{(N-2)/4}(|z|)}{\sqrt{\pi}\; 2^{(N-2)/4}\; \Gamma(N/4)} \qquad (9)$$
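
The claim that the distribution of Z depends only on N, and not on how the modes are paired, can be probed numerically. The snippet below is a rough sketch under explicit assumptions: the function names, the values of $\beta_j$, the two pairings and the sample size are illustrative, and each $B_{ij}$ is assumed to be drawn from $\mathcal{N}(0, \sigma_{ij}^2)$ with $\sigma_{ij}^2 = 1/(X_i^2 \beta_j^2)$. Over the full ensemble, the variance of Z should then be close to N/2 for any pairing.

```python
import random


def sample_Z(betas, pairs, rng):
    """Draw X, then one Z = sum over pairs of B_ij X_i X_j with B_ij ~ N(0, 1/(X_i^2 beta_j^2))."""
    xs = [rng.gauss(0.0, beta) for beta in betas]
    z = 0.0
    for i, j in pairs:
        sigma_ij = 1.0 / (abs(xs[i]) * betas[j])  # standard deviation of B_ij
        z += rng.gauss(0.0, sigma_ij) * xs[i] * xs[j]
    return z


def variance_of_Z(betas, pairs, n_draws, seed):
    """Sample variance of Z over the full (B, X) ensemble."""
    rng = random.Random(seed)
    zs = [sample_Z(betas, pairs, rng) for _ in range(n_draws)]
    mean = sum(zs) / n_draws
    return sum((z - mean) ** 2 for z in zs) / n_draws


betas = [0.5, 1.0, 2.0, 3.0, 0.2, 1.5, 4.0, 0.8]  # hypothetical standard deviations, N = 8
adjacent = [(0, 1), (2, 3), (4, 5), (6, 7)]       # pairing used in the definition of Z
shuffled = [(0, 2), (1, 3), (4, 6), (5, 7)]       # an alternative pairing
print("adjacent pairing:", round(variance_of_Z(betas, adjacent, 40000, seed=1), 3))
print("shuffled pairing:", round(variance_of_Z(betas, shuffled, 40000, seed=2), 3))
print("expected N/2:", len(betas) / 2)
```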


Again, because of the choice of the distribution of the $B_{ij}$ variables, the distribution of 
Z is solely a function of N. This is quite a useful feature for the following reason: Consider 
four random variables $R_1$, $R_2$, $R_3$, $R_4$. Let only $R_1$ and $R_3$ be correlated, and $R_2$ and $R_4$ be 
correlated: 

$$\langle R_1 R_3 \rangle = \langle R_2 R_4 \rangle = \epsilon, \quad \text{where } \epsilon \ll 1 \qquad (10)$$

Now, say you are testing the null hypothesis that all four variables are mutually independent. 
You come up with two test statistics, $T_1 := R_1 R_2 + R_3 R_4$ and $T_2 := R_1 R_3 + R_2 R_4$. From (10), 
it is clear that $\langle T_1 \rangle$ is indistinguishable from that predicted by the null hypothesis, whereas 
$\langle T_2 \rangle$ gives a different prediction from the null hypothesis. Of course, the distribution of 
both $T_1$ and $T_2$ will be different from that predicted by the null hypothesis, but, at least 
for non-pathological distributions, $T_1$ is an $O(\epsilon)$ worse discriminator for testing the null 
hypothesis. 
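
A small simulation makes the difference between the two statistics visible. This is a sketch only: the construction of the correlated variables from independent unit normals, the value of $\epsilon$ and the function name are assumptions made for the example.

```python
import math
import random


def mean_T(eps, n_draws, seed=0):
    """Return the sample means of T1 = R1 R2 + R3 R4 and T2 = R1 R3 + R2 R4."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - eps * eps)
    t1 = t2 = 0.0
    for _ in range(n_draws):
        r1 = rng.gauss(0.0, 1.0)
        r2 = rng.gauss(0.0, 1.0)
        r3 = eps * r1 + c * rng.gauss(0.0, 1.0)  # <R1 R3> = eps, unit variance
        r4 = eps * r2 + c * rng.gauss(0.0, 1.0)  # <R2 R4> = eps, unit variance
        t1 += r1 * r2 + r3 * r4
        t2 += r1 * r3 + r2 * r4
    return t1 / n_draws, t2 / n_draws


eps = 0.1
t1, t2 = mean_T(eps, 100000)
print("mean of T1:", round(t1, 4), "(null prediction: 0)")
print("mean of T2:", round(t2, 4), "(null prediction: 0, with correlation: 2 * eps)")
```

The mean of $T_1$ stays consistent with zero, while the mean of $T_2$ picks up the correlation at first order in $\epsilon$.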

If the CMB anomalies are due to correlations amongst the different $a_{\ell m}$'s, from the form 
of (8), one may naively worry that just like with the $R_i$'s, the order in which the $a_{\ell m}$'s appear 
in the equation may matter. That is, instead of the order in (8), one could alternatively 
consider 

$$Z' = B_{13} X_1 X_3 + B_{24} X_2 X_4 + \cdots + B_{N-2,N} X_{N-2} X_N$$


This is a different statistic from Z. In this manner, there are $(N-1)!!$ alternatives⁷ to Z. In 
principle, each of these combinations will have a different distribution for Z. But, because 
of our choice of the distribution of $B_{ij}$, we have that the distribution of Z depends only 
on N. With this motivation, let us define Perm(Z) as a permutation of the indices in Z 
that ensures that each index appears once and only once. Now, define $\mathcal{Z}$ as the set of all 
Perm(Z). It is obvious that $\mathcal{Z}$ is distributed as (9). It is $\mathcal{Z}$ that is the statistic that we shall 
consider for the rest of this article, though, by abuse of notation, we shall refer to it as Z. 
In this way, the choice of the distribution function for $B_{ij}$ helps us overcome the difficulty 
of having to consider $(N-1)!!$ different distributions, while ensuring that there is no loss of 
generality in the sequence of indices chosen. 
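
The counting of the alternatives can be verified by brute force for small N. The following sketch enumerates all ways of splitting an even number of indices into unordered pairs and compares the count with the double factorial; the function names are hypothetical and the enumeration is only practical for small N.

```python
def pairings(indices):
    """Yield every way of splitting an even-length list of indices into unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def double_factorial(n):
    """Return n!! = n (n - 2) (n - 4) ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


for n in (2, 4, 6, 8):
    count = sum(1 for _ in pairings(list(range(1, n + 1))))
    print("N =", n, "| pairings:", count, "| (N - 1)!! =", double_factorial(n - 1))
```

The first index is always paired with one of the remaining ones, which mirrors the counting argument: N - 1 choices for the first pair, N - 3 for the next, and so on.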

Finally, we had earlier stated that we would talk about the case with an odd N, which 
arises if we have an even $\ell_{\max}$. In that case, we can just consider pairs of the first $(N-1)$ 
of the indices of Perm({1, ..., N}), which occurs in $\mathcal{Z}$ anyway. This would mean that we 
are losing out on one mode during every permutation, but, the procedure ensures that there 
isn’t any arbitrariness in the choice of that mode. 

⁷ Consider the sequence {1, ..., N}. Each index has to occur once. So, there is no freedom in choosing the i in 
(8). For the j corresponding to the first term, there are $(N-1)$ possibilities. Again, the i for the second 
term is effectively fixed, as it must appear in the sum. For the j corresponding to this term, there are 
$(N-3)$ possibilities, and so on. 

FIG. 2: PDF for $\Psi_Z(A^2, N, m)$ for m = 10^, N = 672. 


B. Hypothesis Testing 


The procedure of testing the null hypothesis $H_0$ is identical to the one we employed for 
the linear test statistic Y. The expected distribution under $H_0$ is given by (9) and we can 
use actual data to determine the realized distribution. We can then calculate the statistical 
significance of a departure from $H_0$ by using the procedure outlined in the previous section. 
Let us denote the probability distribution function for the Anderson-Darling statistic for the 
Z statistic as $\Psi_Z(A^2, N, m)$. We can repeat the procedure outlined in III A to determine this 
PDF; the only change will be that, instead of generating distributions of the Y statistic, 
we will generate distributions of the Z statistic, and, instead of using (4), we shall use (9). 
For a particular choice of N and m, this PDF is shown in Figure 2. Figures 1 and 2 look 
to be very similar, and we have confirmed this for other values of $\ell_{\max}$. That is, for a given 
$\ell_{\max}$, the distribution of the A-D statistic is the same for both the Y and the Z statistics. 
The distributions are different for different values of $\ell_{\max}$. 


V. RESULTS 


Having discussed the method for testing for the null hypothesis in the previous sections, 
in this section, we demonstrate that the method actually works. To do this, we break one of 
the assumptions in the null hypothesis. The easiest condition to break (in the sense that the 
new probability distribution is easiest to describe) is that of zero-mean. Previous studies [22] 
have looked at relaxing this condition, though they concentrate on somewhat larger values 
of £. They found that, at least in the range of multipoles they considered, the data seemed 
to be consistent with the zero-mean hypothesis. Here, we choose to break the condition of 
independence and normal distribution of the $a_{\ell m}$'s, mostly because that is usually posited 
as the reason behind the anomalies. But, we should emphasise that a similar analysis can 
be performed (in fact, more easily) with a non-zero mean. 

Now, there is an infinite number of ways of breaking the independent, normally distributed 
hypothesis |29|. We break it by deliberately masking the fiducial CMB sky about 
the equator. This masking breaks statistical isotropy and thus leads to a correlation between 
modes. The resulting probability distribution of the $a_{\ell m}$'s is difficult to analytically 
estimate, but it is clear that a greater degree of masking leads to a “bigger” departure from 
the null hypothesis. Then, the strategy behind the demonstration is this: 

1. Generate a set of fiducial CMB sky maps from a known set of $C_\ell$'s.

2. Generate Y and Z statistics using the $a_{\ell m}$'s of these maps.

3. Mask these maps to varying degrees and determine the resulting $a_{\ell m}$'s, and Y and Z
statistics.

The method can then be said to work if increasing the masking leads to a bigger departure
from the null hypothesis (in the sense of the Anderson-Darling test applied to the Y and
Z statistics). Also, for zero masking, the distributions one gets with the CMB maps must
correspond to $\Psi_Y(A^2, N, m)$ and $\Psi_Z(A^2, N, m)$ respectively.
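
That masking couples otherwise independent modes can be seen in a toy setting. The sketch below is a one-dimensional analogue on a ring of pixels (Fourier modes instead of spherical harmonics), with a flat spectrum, independent complex mode amplitudes and a contiguous masked band. All of these are simplifying assumptions, and the function names are hypothetical. It evaluates the covariance between two neighbouring modes of the masked field exactly, from Cov(b_k, b_k') = sum_q C_q W_{k-q} W*_{k'-q}, where W is the discrete Fourier transform of the mask.

```python
import cmath
import math


def mask_dft(n_pix, masked_fraction):
    """DFT coefficients W_d of a binary mask on a ring of n_pix pixels.

    A contiguous band covering `masked_fraction` of the ring is set to 0,
    every other pixel to 1.
    """
    n_masked = round(masked_fraction * n_pix)
    window = [0.0] * n_masked + [1.0] * (n_pix - n_masked)
    return [sum(w * cmath.exp(-2j * math.pi * d * j / n_pix)
                for j, w in enumerate(window)) / n_pix
            for d in range(n_pix)]


def mode_covariance(w_hat, spectrum, k1, k2):
    """Covariance of masked modes k1 and k2 for independent unmasked modes."""
    n = len(w_hat)
    return sum(spectrum[q] * w_hat[(k1 - q) % n] * w_hat[(k2 - q) % n].conjugate()
               for q in range(n))


def mode_correlation(n_pix, masked_fraction, k1=2, k2=3):
    """Absolute correlation coefficient between masked modes k1 and k2."""
    w_hat = mask_dft(n_pix, masked_fraction)
    spectrum = [1.0] * n_pix  # flat spectrum: an arbitrary choice for the toy
    c12 = mode_covariance(w_hat, spectrum, k1, k2)
    c11 = mode_covariance(w_hat, spectrum, k1, k1).real
    c22 = mode_covariance(w_hat, spectrum, k2, k2).real
    return abs(c12) / math.sqrt(c11 * c22)


fractions = (0.0, 0.05, 0.10, 0.15)
correlations = [mode_correlation(200, f) for f in fractions]
for f, c in zip(fractions, correlations):
    print(f"masked fraction {f:.2f}: |corr| between modes 2 and 3 = {c:.4f}")
```

In this toy, the correlation vanishes (up to rounding) without a mask and grows with the masked fraction, which is the qualitative behaviour the demonstration relies on. It says nothing quantitative about the spherical case.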

As mentioned earlier, one of the things that we need to pick is the range of $\ell$'s that we
will be considering. Because we are concentrating on low-$\ell$ anomalies, we start with the
lowest relevant $\ell$ ($\ell = 2$) and go up to an $\ell_{\max}$. For the rest of this section, let us
choose $\ell_{\max} = 25$. From the relation between N and $\ell_{\max}$ given earlier, this corresponds
to N = 672.
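
As a consistency check on this number, assume (this is our reading, not something restated in this section) that N counts the $2\ell + 1$ coefficients $a_{\ell m}$ of every multipole from $\ell = 2$ to $\ell_{\max}$. Then

```latex
N = \sum_{\ell=2}^{\ell_{\max}} (2\ell + 1) = (\ell_{\max} + 1)^2 - 4 ,
\qquad
N\big|_{\ell_{\max} = 25} = 26^2 - 4 = 672 ,
```

which reproduces the value quoted above.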

Therefore, what we now need to do is to generate m realizations of Y and Z for each of the
CMB maps described in the strategy above and compare the resulting distributions with
$\Psi_Y(A^2, N, m)$ for Y and $\Psi_Z(A^2, N, m)$ for Z. We employ routines in HEALPix [30] to
generate CMB maps from a given set of $C_\ell$'s, mask the maps, and then determine the
corresponding $a_{\ell m}$'s. For the $C_\ell$'s, we use the Planck best-fit values, though, because
this is for testing, any reasonable set would be sufficient. We consider four sets of maps:
unmasked, and with a mask of 5%, 10% and 15% of the pixels about the galactic equator.
We choose m = 10®, so that we can compare
the distribution of the realized vector statistic with that in Figures 1 and 2. We use C++ to
generate the Y and Z statistics and MATHEMATICA to calculate the A-D statistic.

FIG. 3: PDF for $\Psi_Y(A^2, N, m)$ for N = 672, m = 10^, and for different masks. Panels: (a) no
galactic mask; (b) 5% of pixels about the equator masked; (c) 10% of pixels about the equator
masked; (d) 15% of pixels about the equator masked.

For the Y statistic, the results are plotted in Figure 3. As expected, the distribution for
the unmasked sky [Figure 3(a)] resembles that in Figure 1 to a very high degree, and the
other three to a much lesser degree. Clearly, a bigger mask, and thus a bigger departure from
statistical isotropy (and the null hypothesis), leads to a bigger departure of the distribution
from that in Figure 1. Similar results hold for the Z statistic, plotted in Figure 4. These
plots are to be compared with those in Figure 2.

HEALPix website: http://healpix.sourceforge.net/

FIG. 4: PDF for $\Psi_Z(A^2, N, m)$ for N = 672, m = 10® and for different masks. Panels: (a) no
galactic mask; (b) 5% of pixels about the equator masked; (c) 10% of pixels about the equator
masked; (d) 15% of pixels about the equator masked.


VI. CONCLUSION 

The last couple of decades have seen a tremendous amount of progress in the understanding
of the large-scale structure of our Universe. Some parameters have been determined to
several decimal places and some models have been ruled out to extremely high significance.
Observationally, the only real challenge to this $\Lambda$CDM paradigm seems to be the large-scale
CMB anomalies, of which many have been reported. The most important criticism levelled
against the anomalies has to do with the fact that the anomalies are an a posteriori
phenomenon: one tests for anomalies after having “looked” at the data. This is a fair
criticism and in this paper we have proposed a method that addresses this very criticism. In
a very general manner, we seek to test the null hypothesis that the $a_{\ell m}$'s are independent,
zero-mean, normally distributed variables with an m-independent variance.

We consider linear (Y) and quadratic (Z) combinations of the $a_{\ell m}$'s, with randomized
co-efficients. The probability distribution of these co-efficients is of a very specific form but
depends only on the $C_\ell$'s. This choice greatly simplifies the PDFs of Y and Z. Given a CMB
map, the Y and Z distributions corresponding to the $a_{\ell m}$'s of the map can be determined.
These distributions can be compared with the fiducial distributions for Y and Z derived
earlier, and a high degree of incompatibility between the distributions would
mean that the data are not well described by the null hypothesis.

To make this comparison between distributions, we have suggested a very slight modification
of the Anderson-Darling test. Of course, other tests could also be used for this purpose.
In order to demonstrate the usefulness of the test, we generated CMB maps with varying
degrees of masking in them. This masking breaks statistical isotropy and thus results in a
departure from the null hypothesis. We demonstrated that, firstly, for zero masking, the
distribution of the Anderson-Darling test statistic is what we expect it to be. Secondly,
increasing the masking did lead to distributions of the Anderson-Darling test statistic that
were further and further removed from the distribution that arises out of the null hypothesis.

A few points to note regarding this method are: (i) Like most other “goodness-of-fit”
tests without an alternative hypothesis, this is a frequentist analysis. In particular, because
of its very general and stochastic nature, the test may be susceptible to Type II errors,
that is, failures to reject a false null hypothesis. If we do have an alternative hypothesis,
we can then compute the power of the test and make a quantitative statement about the
probability of Type II errors or, indeed, do a Bayesian analysis. In the absence of such an
alternative hypothesis, a p-value compatible with the null hypothesis should not be taken
to mean that the data indicate that the null hypothesis is true. (ii) In our analysis, we
have assumed that the $C_\ell$'s are fixed numbers, but, at least from a Bayesian perspective,
they themselves are random variables, with an associated variance. We do not see a way
around this, because taking into account the stochastic nature of the $C_\ell$'s would make the
analysis extremely complicated. Also, recall that the variance of $C_\ell$ scales with $C_\ell$
itself (the cosmic variance is $2C_\ell^2/(2\ell+1)$, so the standard deviation is proportional to
$C_\ell$). Thus, for multipoles where the random nature of $C_\ell$ is most pronounced (that is, the
lowest of the $\ell$'s), the value of $C_\ell$ is large to begin with. This partially alleviates the problem
associated with assuming that the $C_\ell$'s are fixed numbers. (iii) Though we have concentrated
on using the method to make statements about the $a_{\ell m}$'s, it is clear that our method works
in general for any set of random variables that are hypothesized to be described by $H_0$. So,
our method could be used to test $H_0$ in a variety of situations, becoming particularly useful
when there are only a few realizations of several independent, non-identically distributed
Gaussian variables.
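
Point (i) can be made concrete once an alternative is specified. As a minimal sketch, suppose (purely for illustration) that the statistic is standard normal under the null and that the alternative shifts its mean by `shift`. The standard Anderson-Darling statistic is used, and all names below are hypothetical. The power is estimated by Monte Carlo as the fraction of alternative samples whose A^2 exceeds the empirical (1 - alpha) quantile of the null distribution; the Type II error probability is one minus the power.

```python
import math
import random
from statistics import NormalDist


def anderson_darling(sample, cdf):
    """Standard Anderson-Darling statistic A^2 against a fully specified CDF."""
    x = sorted(sample)
    n = len(x)
    eps = 1e-12  # keeps the logarithms finite in the extreme tails
    u = [min(max(cdf(v), eps), 1.0 - eps) for v in x]
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
            for i in range(n))
    return -n - s / n


def estimate_power(shift, m=100, trials=400, alpha=0.05, seed=1):
    """Monte Carlo power of the A-D test against a mean shift (toy alternative)."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf  # null hypothesis: standard normal (assumption)

    # Calibrate the rejection threshold on samples drawn from the null.
    null_a2 = sorted(
        anderson_darling([rng.gauss(0.0, 1.0) for _ in range(m)], cdf)
        for _ in range(trials))
    critical = null_a2[int((1.0 - alpha) * trials)]

    # Count how often samples from the alternative are rejected.
    rejections = sum(
        anderson_darling([rng.gauss(shift, 1.0) for _ in range(m)], cdf) > critical
        for _ in range(trials))
    return rejections / trials


powers = {shift: estimate_power(shift) for shift in (0.0, 0.1, 0.5)}
for shift, power in powers.items():
    print(f"shift = {shift}: power = {power:.3f}, "
          f"Type II error = {1.0 - power:.3f}")
```

For a zero shift the rejection rate should sit near alpha, while small shifts are rejected only rarely. This is the sense in which a p-value compatible with the null hypothesis cannot be read as evidence for it.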

In a future publication, we hope to use our method and actual CMB data to quote 
p-values for the departure of the data from the null hypothesis. Planck is soon expected 
to release CMB polarization data, which can easily be incorporated into our analysis and 
should tell us more about the largest scales of the observable universe. 